Abstract

Often the regression function appearing in fields like economics, engineering and biomedical sciences
obeys a system of higher order ordinary differential equations (ODEs). The equations are usually not
analytically solvable. We are interested in inferring on the unknown parameters appearing in the
equations. A significant amount of work has been done on parameter estimation in first order ODE models.
Bhaumik and Ghosal (2014a) considered a two-step Bayesian approach by putting a finite random series
prior on the regression function using a B-spline basis. The posterior distribution of the parameter vector is
induced from that of the regression function. Although this approach is computationally fast, the Bayes
estimator is not asymptotically efficient. Bhaumik and Ghosal (2014b) remedied this by directly
considering the distance between the function in the nonparametric model and a Runge-Kutta (RK4)
approximate solution of the ODE while inducing the posterior distribution on the parameter. They also studied
the direct Bayesian method obtained from the approximate likelihood obtained by the RK4 method. In
this paper we extend these ideas to the higher order ODE model and establish Bernstein-von Mises
theorems for the posterior distribution of the parameter vector for each method with $n^{-1/2}$
contraction rate.


Keywords: Approximate likelihood, Bayesian inference, Bernstein-von Mises theorem, higher order 
ordinary differential equation, Runge-Kutta method, spline smoothing. 

1 Introduction 

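Consider a regression model $Y = f_\theta(x) + \varepsilon$ with unknown parameter
$\theta \in \Theta \subseteq \mathbb{R}^p$ and $x \in [0,1]$. The functional form of $f_\theta(\cdot)$
is not known, but $f_\theta(\cdot)$ satisfies a $q$th order ordinary differential equation (ODE)
given by

$$F\left(t, f_\theta(t), \frac{d f_\theta(t)}{dt}, \ldots, \frac{d^q f_\theta(t)}{dt^q}, \theta\right) = 0, \tag{1}$$
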
where $F$ is a known sufficiently smooth real-valued function in its arguments which we shall refer to as the
binding function. Higher order ODE models are encountered in different fields. For example, the system of
ODEs describing the concentrations of glucose and hormone in blood is

$$\frac{dg(t)}{dt} = -m_1 g(t) - m_2 h(t) + J(t), \tag{2}$$

$$\frac{dh(t)}{dt} = -m_3 h(t) + m_4 g(t), \tag{3}$$

where $g(t)$ and $h(t)$ denote the glucose and hormone concentrations at time $t$ respectively. Here the function
$J(t)$ is known and $m_1$, $m_2$, $m_3$ and $m_4$ are unknown parameters. If we have only measurements on $g(t)$,
we can differentiate both sides of (2) to obtain the second order ODE

$$\frac{d^2 g(t)}{dt^2} + 2\alpha \frac{dg(t)}{dt} + \omega_0^2\, g(t) = S(t),$$

where $\alpha = (m_1 + m_3)/2$, $\omega_0^2 = m_1 m_3 + m_2 m_4$ and $S(t) = m_3 J(t) + dJ(t)/dt$. Another popular example
is the Van der Pol oscillator used in physical and biological sciences. The oscillator obeys the second order
ODE

$$\frac{d^2 f_\theta(t)}{dt^2} - \theta\,(1 - f_\theta^2(t))\,\frac{d f_\theta(t)}{dt} + f_\theta(t) = 0.$$
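
The reduction of the glucose-hormone system to a single second order ODE in $g$ can be checked in a few lines. The sketch below assumes that the hormone equation has the linear form $dh/dt = -m_3 h + m_4 g$, which is the form consistent with the stated $\alpha$, $\omega_0^2$ and $S(t)$. The first step differentiates the glucose equation, the second substitutes the hormone equation, and the third eliminates $m_2 h$ using the glucose equation once more.

```latex
\begin{align*}
\frac{d^2 g}{dt^2}
  &= -m_1 \frac{dg}{dt} - m_2 \frac{dh}{dt} + \frac{dJ}{dt} \\
  &= -m_1 \frac{dg}{dt} + m_3\,(m_2 h) - m_2 m_4\, g + \frac{dJ}{dt} \\
  &= -m_1 \frac{dg}{dt} + m_3\left(-\frac{dg}{dt} - m_1 g + J\right) - m_2 m_4\, g + \frac{dJ}{dt} \\
  &= -(m_1 + m_3)\frac{dg}{dt} - (m_1 m_3 + m_2 m_4)\, g + m_3 J + \frac{dJ}{dt}.
\end{align*}
```

Moving the terms in $g$ and $dg/dt$ to the left-hand side gives $2\alpha = m_1 + m_3$, $\omega_0^2 = m_1 m_3 + m_2 m_4$ and $S(t) = m_3 J(t) + dJ(t)/dt$.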

A related problem is a stochastic differential equation model where a signal is continuously observed
in time with a noise process typically driven by a Brownian motion. Bergstrom (1983, 1985, 1986) used
the maximum likelihood estimation (MLE) technique to estimate the parameters involved in a higher order
stochastic differential equation given by

$$\frac{d^q y(t)}{dt^q} = A_1(\theta)\frac{d^{q-1} y(t)}{dt^{q-1}} + \cdots + A_{q-1}(\theta)\frac{dy(t)}{dt} + A_q(\theta)\, y(t) + b(\theta) + z(t) + W(t),$$

where $A_j(\cdot)$, $j = 1, \ldots, q$, and $b(\cdot)$ are functions on $\Theta$ and $W(\cdot)$ is the noise process (Karlin and Taylor,
1981, page 342). Here $z(\cdot)$ is a non-random function. Bergstrom (1983) showed that the maximum
likelihood estimator of $\theta$ is asymptotically normal and asymptotically efficient. An efficient algorithm was
given in Bergstrom (1985) to compute the Gaussian likelihood for estimating the parameters involved in a
non-stationary higher order stochastic ODE. Appropriate linear transformations are used in this algorithm
to avoid the computation of the covariance matrix of the observations.

In this paper we develop three Bayesian approaches for inference on $\theta$. In our first approach we use
the Runge-Kutta method to obtain an approximate solution $f_{\theta,r_n}(\cdot)$ using $r_n$ grid points, $n$ being the number of
observations, and then construct an approximate likelihood and obtain the posterior distribution of $\theta$ using
the prior on $\theta$. In another approach we assign a prior on the coefficient vector $\beta$ of the B-spline approximation
$f(\cdot,\beta)$ of the regression function. We define $\theta$ as
$\arg\min_{\eta \in \Theta} \int_0^1 (f(t,\beta) - f_{\eta,r_n}(t))^2 g(t)\,dt$ and induce a
posterior distribution of $\theta$ using the posterior distribution of $\beta$. Here $g(\cdot)$ is an appropriate weight function.
The third approach is a generalization of the two-step approach. We use the B-spline approximation of the
regression function and define

$$\theta = \arg\min_{\eta \in \Theta} \int_0^1 \left| F\left(t, f(t,\beta), \frac{d f(t,\beta)}{dt}, \ldots, \frac{d^q f(t,\beta)}{dt^q}, \eta\right) \right|^2 w(t)\,dt,$$

where the weight function $w(\cdot)$ and its first $(q-1)$ derivatives vanish at 0 and 1. For the sake of simplicity we
have assumed the regression function to be one dimensional. Extension to the multidimensional case, where
the binding function $F$ is also vector-valued, can be carried out similarly.

The rest of the paper is organized as follows. The notations are described in Section 2. Section 3 contains
some preliminaries of the Runge-Kutta method. The model assumptions and prior specifications are given in
Section 4. Section 5 contains the descriptions of the estimation methods used. The main results are given in
Section 6. In Section 7 we have carried out a simulation study. Proofs of the results are given in Section 8.

2 Notations and preliminaries

We describe a set of notations to be used in this paper. Boldfaced letters are used to denote vectors and
matrices. For a matrix $A$, the symbols $A_{i,}$ and $A_{,j}$ stand for the $i$th row and $j$th column of $A$ respectively.
The identity matrix of order $p$ is denoted by $I_p$. We use the symbols maxeig($A$) and mineig($A$) to denote
the maximum and minimum eigenvalues of the matrix $A$ respectively. The $L_2$ norm of a vector $x \in \mathbb{R}^p$
is given by $\|x\| = (\sum_{i=1}^p x_i^2)^{1/2}$. For a function $\phi(\cdot)$ and a vector $x \in \mathbb{R}^p$ we denote
$D_x \phi = \frac{\partial \phi}{\partial x} = \left(\frac{\partial \phi}{\partial x_1}, \ldots, \frac{\partial \phi}{\partial x_p}\right)$.
The notation $f^{(r)}(\cdot)$ stands for the $r$th order derivative of a function $f(\cdot)$, that
is, $f^{(r)}(t) = \frac{d^r}{dt^r} f(t)$. For the function $\theta \mapsto f_\theta(x)$, the notation $\dot f_\theta(\cdot)$ implies
$\frac{\partial}{\partial \theta} f_\theta(\cdot)$. Similarly, we
denote $\ddot f_\theta(\cdot) = \frac{\partial^2}{\partial \theta\, \partial \theta^T} f_\theta(\cdot)$. A vector valued function is represented by the boldfaced symbol $\boldsymbol{f}(\cdot)$. Let us
define $\|\boldsymbol{f}\|_g = (\int_0^1 \|\boldsymbol{f}(t)\|^2 g(t)\,dt)^{1/2}$ for $\boldsymbol{f} : [0,1] \mapsto \mathbb{R}^p$ and $g : [0,1] \mapsto [0,\infty)$. The weighted inner
product $\int_0^1 \boldsymbol{f}_1^T(t) \boldsymbol{f}_2(t) g(t)\,dt$ of two vector-valued functions $\boldsymbol{f}_1(\cdot)$ and $\boldsymbol{f}_2(\cdot)$ with the corresponding weight
function $g(\cdot)$ is denoted by $\langle \boldsymbol{f}_1, \boldsymbol{f}_2 \rangle_g$. For numerical sequences $a_n$ and $b_n$, $a_n = o(b_n)$, $b_n \gg a_n$ and
$a_n \ll b_n$ all mean $a_n/b_n \to 0$ as $n \to \infty$. The notations $a_n = O(b_n)$, $a_n \lesssim b_n$ are used to indicate
that $a_n/b_n$ is bounded, and $a_n \asymp b_n$ refers to the occurrence of both $a_n = O(b_n)$ and $b_n = O(a_n)$.
The symbol $o_P(1)$ stands for a sequence of random variables converging in $P$-probability to zero, whereas
$O_P(1)$ stands for a stochastically bounded sequence of random variables. Given a sample $\{X_i : i =
1, \ldots, n\}$ and a measurable function $\psi(\cdot)$, we define $\mathbb{P}_n \psi = n^{-1} \sum_{i=1}^n \psi(X_i)$. The symbols $E(\cdot)$ and
Var$(\cdot)$ stand for the mean and variance respectively of a random variable. We use the notation $\mathbb{G}_n \psi$ to
denote $\sqrt{n}(\mathbb{P}_n \psi - E\psi)$. The total variation distance between the probability measures $P$ and $Q$ defined
on $\mathbb{R}^p$ is given by $\|P - Q\|_{TV} = \sup_{B \in \mathscr{R}^p} |P(B) - Q(B)|$, $\mathscr{R}^p$ being the Borel $\sigma$-field on $\mathbb{R}^p$. Given a
set $E$, the symbol $C^m(E)$ stands for the class of functions defined on $E$ having first $m$ continuous partial
derivatives with respect to its arguments on some open set containing $E$. For a set $A$, the notation $1\{A\}$
stands for the indicator function for belonging to $A$. The symbol := means equality by definition.

3 Preliminaries of Runge-Kutta method for higher order ODE 

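Often the differential equation has the form

$$f_\theta^{(q)}(t) - H\left(t, f_\theta(t), f_\theta^{(1)}(t), \ldots, f_\theta^{(q-1)}(t)\right) = 0$$

with initial conditions on $f_\theta^{(j)}(0)$ for $j = 0, \ldots, q-1$, $H$ being known. Note that $t$ can be treated as
a state variable $\chi(t) = t$ which satisfies the $q$th order ODE $\chi^{(q)}(t) = 0$ with initial conditions $\chi(0) = 0$,
$\chi^{(1)}(0) = 1$ and $\chi^{(j)}(0) = 0$ for $j = 2, \ldots, q-1$. Denoting $\boldsymbol{\psi}_\theta(\cdot) = (f_\theta(\cdot), \chi(\cdot))$, we can rewrite the
ODE as

$$\boldsymbol{\psi}_\theta^{(q)}(t) - \boldsymbol{H}\left(\boldsymbol{\psi}_\theta(t), \boldsymbol{\psi}_\theta^{(1)}(t), \ldots, \boldsymbol{\psi}_\theta^{(q-1)}(t)\right) = 0,$$

where $\boldsymbol{H} = (H(\cdot), 0)$. Given equispaced grid points $a_1 = 0, a_2, \ldots, a_{r_n}$ with common difference
$r_n^{-1}$, the approximate solution of the ODE is given by $\boldsymbol{\psi}_{\theta,r_n}(\cdot) = (f_{\theta,r_n}(\cdot), \chi_{r_n}(\cdot))$, where $r_n$ is chosen so that
$r_n \gg \sqrt{n}$. Here $n$ denotes the number of observations. Let
$z_k = (\boldsymbol{\psi}_{\theta,r_n}(a_k), \boldsymbol{\psi}^{(1)}_{\theta,r_n}(a_k), \ldots, \boldsymbol{\psi}^{(q-1)}_{\theta,r_n}(a_k))$
stand for the vector formed by the function $\boldsymbol{\psi}_{\theta,r_n}$ and its $(q-1)$ derivatives at the grid point $a_k$ for
$k = 1, \ldots, r_n$. For $\nu = 1, 2, \ldots, q-1$ we define

$$T^\nu(a_k, z_k, r_n) := \boldsymbol{\psi}^{(\nu)}_{\theta,r_n}(a_k) + \frac{1}{2!\, r_n}\, \boldsymbol{\psi}^{(\nu+1)}_{\theta,r_n}(a_k) + \cdots + \frac{1}{r_n^{q-\nu-1}\,(q-\nu)!}\, \boldsymbol{\psi}^{(q-1)}_{\theta,r_n}(a_k),$$

$$T^q(a_k, z_k, r_n) := 0.$$

Now let $k_p(a_k) := \boldsymbol{H}(U^1, U^2, \ldots, U^q)$ with $U^1, \ldots, U^q$ being given in Table 1.

“Table 1 here”

Following equation (4.16) of Henrici (1962, page 169) we define

$$\Phi^\nu(a_k, z_k, r_n) := T^\nu(a_k, z_k, r_n) + \frac{1}{r_n^{q-\nu}\,(q-\nu+1)!} \sum_{p=1}^{4} l_{\nu p}\, k_p(a_k),$$

where the coefficients are given by

$$l_{\nu 1} = \frac{(q-\nu+1)^2}{(q-\nu+2)(q-\nu+3)}, \qquad l_{\nu 2} = l_{\nu 3} = \frac{2(q-\nu+1)}{(q-\nu+2)(q-\nu+3)}, \qquad l_{\nu 4} = \frac{1-q+\nu}{(q-\nu+2)(q-\nu+3)},$$

for $\nu = 1, \ldots, q$. Then the sequence $z_k$, $k = 1, \ldots, r_n$, can be constructed by the recurrence relation

$$z_{k+1} = z_k + r_n^{-1} \left(\Phi^1(a_k, z_k, r_n), \ldots, \Phi^q(a_k, z_k, r_n)\right)^T.$$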
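
A minimal Python sketch of how an approximate solution on an equispaced grid can be computed. It does not implement the increment functions of Henrici's scheme described above. Instead it applies the classical fourth order Runge-Kutta rule to the equivalent first order system for $(f, f^{(1)}, \ldots, f^{(q-1)})$, which is a common alternative that is also fourth order accurate. The names `higher_order_rhs` and `rk4_system` and the test problem $f'' = -f$ are illustrative assumptions, not part of the paper.

```python
import math


def higher_order_rhs(H):
    """Turn f^(q) = H(t, f, f', ..., f^(q-1)) into a first order system z' = G(t, z)."""
    def G(t, z):
        # z = (f, f', ..., f^(q-1)); shift the derivatives up and append f^(q).
        return list(z[1:]) + [H(t, *z)]
    return G


def rk4_system(G, z0, r_n, t0=0.0, t1=1.0):
    """Classical RK4 with r_n equal steps on [t0, t1]; returns the grid and the states."""
    h = (t1 - t0) / r_n
    t, z = t0, list(z0)
    grid, states = [t], [list(z)]
    for k in range(r_n):
        k1 = G(t, z)
        k2 = G(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
        k3 = G(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
        k4 = G(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t = t0 + (k + 1) * h
        grid.append(t)
        states.append(list(z))
    return grid, states


# Illustrative check on f'' = -f, f(0) = 0, f'(0) = 1, whose solution is sin(t).
G = higher_order_rhs(lambda t, f, df: -f)
grid, states = rk4_system(G, [0.0, 1.0], r_n=100)
error_at_1 = abs(states[-1][0] - math.sin(1.0))
```

For the van der Pol equation one would pass `lambda t, f, df: theta * (1 - f**2) * df - f` for a chosen value of `theta`; the first component of each state is then the approximation to $f_\theta$ at the corresponding grid point.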


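By the proof of Theorem 4.2 of Henrici (1962, page 174), we have

$$\sup_{t \in [0,1]} |f_\theta(t) - f_{\theta,r_n}(t)| = O(r_n^{-4}), \qquad \sup_{t \in [0,1]} \|\dot f_\theta(t) - \dot f_{\theta,r_n}(t)\| = O(r_n^{-4}). \tag{4}$$

4 Model description and prior specification

Now we formally describe the model. The proposed model is given by

$$Y_i = f_\theta(X_i) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{5}$$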

where $\theta \in \Theta$, which is a compact subset of $\mathbb{R}^p$. The function $f_\theta(\cdot)$ is $q$ times differentiable on an open set
containing $[0,1]$ and satisfies the system of ODE given by

$$F\left(t, f_\theta(t), f_\theta^{(1)}(t), \ldots, f_\theta^{(q)}(t), \theta\right) = 0. \tag{6}$$

Let for a fixed $\theta$, $F(\cdot, \cdot, \theta) \in C^{m-q}((0,1) \times \mathbb{R}^{q+1})$ for some integer $m \geq q$. Then, by successive
differentiation we have $f_\theta \in C^m((0,1))$. We also assume that the function $\theta \mapsto f_\theta(\cdot)$ is two times continuously
differentiable. The true regression function is given by $f_0(\cdot)$ which does not necessarily lie in $\{f_\theta : \theta \in \Theta\}$.
Moreover we assume that $f_0 \in C^m([0,1])$. Let $\varepsilon_i$ be independently and identically distributed with mean
zero and finite moment generating function for $i = 1, \ldots, n$. Let the common variance be $\sigma_0^2$. We use
$N(0, \sigma^2)$ as the working model for the error, which may be different from the true distribution of the errors.
We treat $\sigma^2$ as an unknown parameter and assign an inverse gamma prior on $\sigma^2$ with shape and scale
parameters $a$ and $b$ respectively. Additionally it is assumed that $X_i \stackrel{iid}{\sim} G$ with density $g$.

Let us denote $Y = (Y_1, \ldots, Y_n)^T$ and $X = (X_1, \ldots, X_n)^T$. The true joint distribution of $(X_i, \varepsilon_i)$ is
denoted by $P_0$.
5 Methodology

Now we describe the three different approaches of inference on $\theta$ used in this paper.

5.1 Runge-Kutta Sieve Bayesian Method (RKSB)

For RKSB we denote $\gamma = (\theta, \sigma^2)$. The approximate likelihood of the sample $\{(X_i, Y_i) : i = 1, \ldots, n\}$
is given by $\prod_{i=1}^n p_{\gamma,n}(X_i, Y_i)$, where

$$p_{\gamma,n}(X_i, Y_i) = (\sqrt{2\pi}\,\sigma)^{-1} \exp\{-(2\sigma^2)^{-1} |Y_i - f_{\theta,r_n}(X_i)|^2\}\, g(X_i). \tag{7}$$

We also denote

$$p_\gamma(X_i, Y_i) = (\sqrt{2\pi}\,\sigma)^{-1} \exp\{-(2\sigma^2)^{-1} |Y_i - f_\theta(X_i)|^2\}\, g(X_i). \tag{8}$$

The true parameter $\gamma_0 := (\theta_0, \sigma_*^2)$ is defined as $\gamma_0 = \arg\max_{\gamma \in \Theta \times (0,\infty)} P_0 \log p_\gamma$, which takes into
account the natural requirement that if errors are normally distributed and $f_{\theta_0}(\cdot)$ is the true regression
function, then $\gamma_0 = (\theta_0, \sigma_0^2)$, where $\sigma_0^2$ is the true value of the error variance. We denote by $\ell_\gamma$ and $\ell_{\gamma,n}$ the
log-likelihoods with respect to (8) and (7) respectively. If $\gamma_0$ is the unique maximizer of $P_0 \log p_\gamma$, we get

$$\int_0^1 \dot f_{\theta_0}^T(t)\,(f_0(t) - f_{\theta_0}(t))\, g(t)\,dt = 0, \qquad \sigma_*^2 = \sigma_0^2 + \int_0^1 |f_0(t) - f_{\theta_0}(t)|^2 g(t)\,dt. \tag{9}$$

We assume that the sub-matrix of the Hessian matrix of $-P_0 \log p_\gamma$ at $\gamma = \gamma_0$ given by

$$\int_0^1 \left( \dot f_{\theta_0}^T(t)\, \dot f_{\theta_0}(t) - \frac{\partial}{\partial \theta}\left( \dot f_\theta^T(t)\,(f_0(t) - f_{\theta_0}(t)) \right)\Big|_{\theta = \theta_0} \right) g(t)\,dt \tag{10}$$

is positive definite. Note that this condition is automatically satisfied when $f_{\theta_0}(\cdot)$ is the true regression
function. The prior measure on $\Theta$ is assumed to have a Lebesgue-density continuous and positive on a
neighborhood of $\theta_0$. The prior distribution of $\theta$ is assumed to be independent of that of $\sigma^2$. The joint prior
measure is denoted by $\Pi$ with corresponding density $\pi$. We obtain the posterior of $\gamma$ using the approximate
likelihood given by (7).
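
To make the approximate likelihood concrete, here is a minimal Python sketch for the van der Pol model of Section 7. It is an illustration under stated assumptions, not the authors' implementation: the grid solution comes from classical RK4 applied to the first order system, values between grid points are obtained by linear interpolation, and the factor $g(X_i)$ is dropped because it does not depend on $\gamma$. All function names are hypothetical.

```python
import math


def vdp_rk4_path(theta, r_n):
    """RK4 values of (f, f') on the grid k / r_n for the van der Pol ODE
    f'' - theta (1 - f^2) f' + f = 0 with f(0) = 2, f'(0) = 0."""
    def rhs(z):
        return (z[1], theta * (1.0 - z[0] ** 2) * z[1] - z[0])

    def shift(z, k, c):
        return (z[0] + c * k[0], z[1] + c * k[1])

    h = 1.0 / r_n
    z = (2.0, 0.0)
    path = [z]
    for _ in range(r_n):
        k1 = rhs(z)
        k2 = rhs(shift(z, k1, h / 2))
        k3 = rhs(shift(z, k2, h / 2))
        k4 = rhs(shift(z, k3, h))
        z = tuple(z[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(2))
        path.append(z)
    return path


def f_approx(x, path):
    """Evaluate the grid solution at x in [0, 1] by linear interpolation (an assumption)."""
    r_n = len(path) - 1
    k = min(int(x * r_n), r_n - 1)
    w = x * r_n - k
    return (1.0 - w) * path[k][0] + w * path[k + 1][0]


def approx_loglik(theta, sigma2, xs, ys, r_n):
    """Sum of the logs of the Gaussian working densities, without the log g(X_i) terms."""
    path = vdp_rk4_path(theta, r_n)
    rss = sum((y - f_approx(x, path)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * len(xs) * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)


# Noise-free illustrative data generated at theta = 1.
xs = [(i + 0.5) / 40 for i in range(40)]
ys = [f_approx(x, vdp_rk4_path(1.0, 100)) for x in xs]
ll_true = approx_loglik(1.0, 0.01, xs, ys, 100)
ll_wrong = approx_loglik(3.0, 0.01, xs, ys, 100)
```

Adding the log prior density of $(\theta, \sigma^2)$ to `approx_loglik` gives an unnormalized log posterior that a generic sampler could target.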

5.2 Runge-Kutta Two-step Bayesian Method (RKTB) 

In the RKTB approach, the proposed model is embedded in the nonparametric regression model

$$Y = X_n \beta + \varepsilon, \tag{11}$$

where $X_n = ((N_j(X_i)))_{1 \leq i \leq n,\, 1 \leq j \leq k_n + m - 1}$, $N_j(\cdot)$ being the B-spline basis functions of order $m$
with $k_n - 1$ interior knots, see Chapter IX of De Boor (1978). We assume for a given $\sigma^2$

$$\beta \sim N_{k_n+m-1}\left(0, \sigma^2 n^2 k_n^{-1} I_{k_n+m-1}\right). \tag{12}$$

A simple calculation yields that the conditional posterior distribution of $\beta$ given $\sigma^2$ is

$$N_{k_n+m-1}\left( (X_n^T X_n + n^{-2} k_n I_{k_n+m-1})^{-1} X_n^T Y,\ \sigma^2 (X_n^T X_n + n^{-2} k_n I_{k_n+m-1})^{-1} \right). \tag{13}$$
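
The conditional posterior mean of $\beta$ is a ridge-type least squares solution and can be computed directly. The following pure Python sketch is illustrative only: it assumes equispaced interior knots on $[0,1]$ with boundary knots repeated $m$ times, builds the basis by the Cox-de Boor recursion, and solves $(X_n^T X_n + n^{-2} k_n I)\,b = X_n^T Y$ by Gaussian elimination. The function names and the test curve $x^2$ are hypothetical choices.

```python
def bspline_basis(x, knots, order):
    """Values at x of all B-spline basis functions of the given order (Cox-de Boor)."""
    vals = []
    for j in range(len(knots) - 1):
        left, right = knots[j], knots[j + 1]
        # Half-open intervals, except that the last non-empty one is closed on the right.
        inside = left <= x < right or (x == knots[-1] and left < right == knots[-1])
        vals.append(1.0 if inside else 0.0)
    for k in range(2, order + 1):
        new = []
        for j in range(len(knots) - k):
            a = knots[j + k - 1] - knots[j]
            b = knots[j + k] - knots[j + 1]
            t1 = (x - knots[j]) / a * vals[j] if a > 0 else 0.0
            t2 = (knots[j + k] - x) / b * vals[j + 1] if b > 0 else 0.0
            new.append(t1 + t2)
        vals = new
    return vals


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def posterior_mean(xs, ys, order, k_n):
    """Posterior mean of beta under the N(0, sigma^2 n^2 / k_n I) prior; free of sigma^2."""
    n = len(xs)
    interior = [i / k_n for i in range(1, k_n)]          # k_n - 1 interior knots
    knots = [0.0] * order + interior + [1.0] * order
    X = [bspline_basis(x, knots, order) for x in xs]
    d = len(X[0])                                        # equals k_n + order - 1
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) + (k_n / n ** 2 if a == b else 0.0)
          for b in range(d)] for a in range(d)]
    rhs = [sum(X[i][a] * ys[i] for i in range(n)) for a in range(d)]
    return solve(A, rhs), knots


# Illustrative use: recover the curve x^2 from noise-free values with cubic splines.
xs = [i / 199 for i in range(200)]
ys = [x * x for x in xs]
beta_hat, knots = posterior_mean(xs, ys, order=4, k_n=3)
fit_at_half = sum(b * v for b, v in zip(beta_hat, bspline_basis(0.5, knots, 4)))
```

A posterior draw of $\beta$ given $\sigma^2$ would add Gaussian noise with covariance $\sigma^2$ times the inverse of the same matrix `A`; that step is omitted here.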

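For a given parameter $\eta \in \Theta$ let

$$R_{f,n}(\eta) = \left\{ \int_0^1 |f(t,\beta) - f_{\eta,r_n}(t)|^2 g(t)\,dt \right\}^{1/2}, \qquad R_{f_0}(\eta) = \left\{ \int_0^1 |f_0(t) - f_\eta(t)|^2 g(t)\,dt \right\}^{1/2},$$

where $f(t,\beta) = \beta^T N(t)$ with $N(\cdot) = (N_1(\cdot), \ldots, N_{k_n+m-1}(\cdot))^T$. Now we define the parameter $\theta$ by
$\theta = \arg\min_{\eta \in \Theta} R_{f,n}(\eta)$ as any minimizer and induce a posterior distribution on $\Theta$ through the posterior of
$\beta$ given by (13). Thus we extend the definition of $\theta$ beyond the differential equation model. Let us denote
$\theta_0 = \arg\min_{\eta \in \Theta} R_{f_0}(\eta)$. Note that in the well-specified case when $f_{\theta_0}(\cdot)$ is the true regression function with
corresponding true parameter $\theta_0$, the minimum is automatically located at $\theta_0$. We assume that

$$\text{for all } \epsilon > 0, \quad \inf_{\eta : \|\eta - \theta_0\| \geq \epsilon} R_{f_0}(\eta) > R_{f_0}(\theta_0), \tag{14}$$

that is, $R_{f_0}(\cdot)$ has a well separated unique minimum at the point $\theta_0$.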

5.3 Two-step Bayesian Method 

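Here we use the same nonparametric model and prior specifications as in RKTB, but define the true
parameter $\theta_0$ as the unique minimizer of
$\eta \mapsto \|F(\cdot, f_0(\cdot), f_0^{(1)}(\cdot), \ldots, f_0^{(q)}(\cdot), \eta)\|_w$, where $w(\cdot)$ is a
non-negative sufficiently smooth weight function and $w(\cdot)$ as well as its first $(q-1)$ derivatives vanish at 0 and
1. Here $\theta$ is defined as

$$\theta = \arg\min_{\eta \in \Theta} \left\| F\left(\cdot, f(\cdot,\beta), f^{(1)}(\cdot,\beta), \ldots, f^{(q)}(\cdot,\beta), \eta\right) \right\|_w,$$

where $f(\cdot,\beta) = \beta^T N(\cdot)$ and $f^{(r)}(\cdot,\beta) = \frac{d^r}{dt^r} f(\cdot,\beta)$ for $r = 1, \ldots, q$. We also assume that for all $\epsilon > 0$

$$\inf_{\eta : \|\eta - \theta_0\| \geq \epsilon} \left\| F\left(\cdot, f_0(\cdot), f_0^{(1)}(\cdot), \ldots, f_0^{(q)}(\cdot), \eta\right) \right\|_w > \left\| F\left(\cdot, f_0(\cdot), f_0^{(1)}(\cdot), \ldots, f_0^{(q)}(\cdot), \theta_0\right) \right\|_w, \tag{15}$$

which implies that $\theta_0$ is a well separated point of minimum of
$\eta \mapsto \|F(\cdot, f_0(\cdot), f_0^{(1)}(\cdot), \ldots, f_0^{(q)}(\cdot), \eta)\|_w$.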
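
For the van der Pol binding function $F = f^{(2)} - \eta\,(1 - f^2) f^{(1)} + f$, the two-step criterion is quadratic in $\eta$, so the minimizer has a closed form: writing $a = f^{(2)} + f$ and $b = (1 - f^2) f^{(1)}$, it equals $\int a\,b\,w \big/ \int b^2 w$. The Python sketch below evaluates this ratio on an equispaced grid. The weight $w(t) = t^2(1-t)^2$ and the function names are hypothetical choices for illustration, and the derivative values in the usage example are synthetic (constructed so that $F$ vanishes at $\eta = 1.5$), not spline derivatives.

```python
def weight(t):
    """Hypothetical weight: w and its first derivative vanish at 0 and 1 (enough for q = 2)."""
    return (t * (1.0 - t)) ** 2


def two_step_theta(ts, f, df, d2f, w=weight):
    """Minimizer over eta of the weighted squared norm of
    F = f'' - eta (1 - f^2) f' + f, approximated by sums on an equispaced grid."""
    num = 0.0
    den = 0.0
    for t, f0, f1, f2 in zip(ts, f, df, d2f):
        a = f2 + f0                      # part of F that is free of eta
        b = (1.0 - f0 ** 2) * f1         # coefficient of eta in F (up to sign)
        num += a * b * w(t)
        den += b * b * w(t)
    return num / den                     # grid spacing cancels in the ratio


# Synthetic illustration: choose f and f', then set f'' so that F = 0 at eta = 1.5.
ts = [i / 200 for i in range(201)]
f = [2.0 - t * t for t in ts]
df = [-2.0 * t for t in ts]
d2f = [1.5 * (1.0 - a * a) * b - a for a, b in zip(f, df)]
theta_hat = two_step_theta(ts, f, df, d2f)
```

In the method itself, `f`, `df` and `d2f` would be the values of $f(\cdot,\beta)$ and its first two derivatives for a posterior draw of $\beta$, so each draw of $\beta$ yields one draw of $\theta$ without solving the ODE.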

6 Main results 

Now we state the theoretical results corresponding to each of the three approaches. 

6.1 RKSB 

Theorem 1. Let the posterior probability measure related to RKSB be denoted by $\Pi_n$. Then the posterior of $\gamma$
contracts at $\gamma_0$ at the rate $n^{-1/2}$ and

$$\left\| \Pi_n\left(\sqrt{n}(\gamma - \gamma_0) \in \cdot \mid X, Y\right) - N(\mu_n, \Sigma) \right\|_{TV} \stackrel{P_0}{\to} 0,$$

where

$$\Sigma = \begin{pmatrix} \sigma_*^2 V_{\theta_0}^{-1} & 0 \\ 0 & 2\sigma_*^4 \end{pmatrix}$$

with

$$V_{\theta_0} = \int_0^1 \left( \dot f_{\theta_0}^T(t)\, \dot f_{\theta_0}(t) - \frac{\partial}{\partial \theta}\left( \dot f_\theta^T(t)\,(f_0(t) - f_{\theta_0}(t)) \right)\Big|_{\theta = \theta_0} \right) g(t)\,dt$$

and $\mu_n = \Sigma\, \mathbb{G}_n \dot\ell_{\gamma_0,n}$.

Since $\theta$ is a sub-vector of $\gamma$, we get the Bernstein-von Mises Theorem for the posterior distribution of
$\sqrt{n}(\theta - \theta_0)$, the mean and dispersion matrix of the limiting Gaussian distribution being the corresponding
sub-vector and sub-matrix of $\mu_n$ and $\Sigma$ respectively. We also get the following important corollary.

Corollary 1. When the regression model (5) is correctly specified and also the true distribution of error is
Gaussian, the Bayes estimator based on $\Pi_n$ is asymptotically efficient.

Remark 1. We can get similar results when the covariates are deterministic under the criterion that

$$\sup_{t \in [0,1]} |Q_n(t) - Q(t)| = o(n^{-1/2}),$$

where $Q_n(\cdot)$ is the empirical distribution function of the covariate sample and $Q(\cdot)$ is a distribution function
with positive pdf on $[0,1]$.

6.2 RKTB 


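In RKTB we assume that the matrix

$$J(\theta_0) = -\int_0^1 \ddot f_{\theta_0}(t)\,(f_0(t) - f_{\theta_0}(t))\, g(t)\,dt + \int_0^1 (\dot f_{\theta_0}(t))^T \dot f_{\theta_0}(t)\, g(t)\,dt$$

is nonsingular. Note that in the well-specified case the first term vanishes and hence $J(\theta_0)$ equals the second
term which is positive definite. Let us denote $C(t) = (J(\theta_0))^{-1} (\dot f_{\theta_0}(t))^T$ and $G_n^T = \int_0^1 C(t) N^T(t) g(t)\,dt$.
Also, we denote the posterior probability measure of RKTB by $\Pi_n^*$. Now we have the following result.

Theorem 2. Let

$$\mu_n^* = \sqrt{n}\, G_n^T (X_n^T X_n)^{-1} X_n^T Y - \sqrt{n}\,(J(\theta_0))^{-1} \int_0^1 (\dot f_{\theta_0}(t))^T f_0(t)\, g(t)\,dt,$$

$$\Sigma_n^* = n\, G_n^T (X_n^T X_n)^{-1} G_n,$$

$$B = \left(\left( \langle C_k(\cdot), C_{k'}(\cdot) \rangle_g \right)\right)_{k,k' = 1, \ldots, p}.$$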


If B is non-singular, then for m >3 and kn ^ 


$$\left\| \Pi_n^*\left(\sqrt{n}(\theta - \theta_0) \in \cdot \mid X, Y\right) - N\left(\mu_n^*, \sigma_0^2 \Sigma_n^*\right) \right\|_{TV} = o_{P_0}(1). \tag{16}$$

Remark 2. Following the steps of the proof of Lemma 10 of Bhaumik and Ghosal (2014b) it can be proved
that both $\mu_n^*$ and $\Sigma_n^*$ are stochastically bounded. Hence, with high true probability the posterior distribution
of $\theta$ contracts at $\theta_0$ at $n^{-1/2}$ rate.

We also get the following important corollary.

Corollary 2. When the regression model (5) is correctly specified and the true distribution of error is
Gaussian, the Bayes estimator based on $\Pi_n^*$ is asymptotically efficient.

Remark 3. Similar results will follow for deterministic covariates provided that

$$\sup_{t \in [0,1]} |Q_n(t) - Q(t)| = o(k_n^{-1}),$$

where $Q_n(\cdot)$ is the empirical distribution function of the covariate sample and $Q(\cdot)$ is a distribution function
with positive pdf on $[0,1]$. Note that this condition holds with probability tending to one when the covariates
are random.

6.3 Two-step Bayesian Method 

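We denote $h(\cdot) = (f(\cdot,\beta), f^{(1)}(\cdot,\beta), \ldots, f^{(q)}(\cdot,\beta))^T$ and $h_0(\cdot)$ to be similar to $h(\cdot)$ with $f$ being
replaced by $f_0$. We denote $G(t, h(t), \theta) = (D_\theta F(t, h(t), \theta))^T F(t, h(t), \theta)$. Before obtaining the
Bernstein-von Mises Theorem for the two-step Bayesian method, we use the following lemma to get an approximate
linearization of $\sqrt{n}(\theta - \theta_0)$.

Lemma 1. Let the matrix

$$M(h_0, \theta_0) = \int_0^1 D_{\theta_0}\left(G(t, h_0(t), \theta_0)\right) w(t)\,dt$$

be nonsingular. For $m > (2q + 2)$, $k_n \ll n^{1/(4q+4)}$, under assumption (15), there exists
$E_n \subseteq \Theta \times C^m((0,1))$ with $\Pi(E_n^c \mid X, Y) = o_{P_0}(1)$, such that

$$\sup_{(\theta, f) \in E_n} \left\| \sqrt{n}(\theta - \theta_0) - (M(h_0, \theta_0))^{-1} \sqrt{n}\,(\Gamma(f) - \Gamma(f_0)) \right\| \to 0, \tag{17}$$

where $\Gamma(z) := -\sum_{r=0}^{q} (-1)^r \int_0^1 \frac{d^r}{dt^r}\left[ D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t) \right]_{,r+1} z(t)\,dt$ is a linear functional of $z(\cdot)$ for
any function $z : [0,1] \mapsto \mathbb{R}$.

Denoting $A(t) = -(M(h_0, \theta_0))^{-1} \sum_{r=0}^{q} (-1)^r \frac{d^r}{dt^r}\left[ D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t) \right]_{,r+1}$, we have

$$(M(h_0, \theta_0))^{-1}\, \Gamma(f) = \int_0^1 A(t)\, \beta^T N(t)\,dt = H_n^T \beta, \tag{18}$$

where $H_n^T = \int_0^1 A(t) N^T(t)\,dt$ which is a matrix of order $p \times (k_n + m - 1)$. Then in order to approximate the
posterior distribution of $\theta$, it suffices to study the asymptotic posterior distribution of the linear functional
of $\beta$ given by (18). The next theorem describes the approximate posterior distribution of $\sqrt{n}(\theta - \theta_0)$. We
denote the posterior probability measure of the two-step method by $\Pi_n^{**}$.

Theorem 3. Let us denote

$$\mu_n^{**} = \sqrt{n}\, H_n^T (X_n^T X_n)^{-1} X_n^T Y - \sqrt{n}\,(M(h_0, \theta_0))^{-1}\, \Gamma(f_0),$$

$$\Sigma_n^{**} = n\, H_n^T (X_n^T X_n)^{-1} H_n$$

and $D = \left(\left( \langle A_k(\cdot), A_{k'}(\cdot) \rangle_g \right)\right)_{k,k' = 1, \ldots, p}$. If $D$ is non-singular, then under the conditions of Lemma 1,

$$\left\| \Pi_n^{**}\left(\sqrt{n}(\theta - \theta_0) \in \cdot \mid X, Y\right) - N\left(\mu_n^{**}, \sigma_0^2 \Sigma_n^{**}\right) \right\|_{TV} = o_{P_0}(1). \tag{19}$$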

As in RKSB and RKTB, we get similar results for two-step Bayesian approach when the regressor is 
non-random under appropriate conditions. 

7 Simulation Study 

We consider the van der Pol equation

$$\frac{d^2 f_\theta(t)}{dt^2} - \theta\,(1 - f_\theta^2(t))\,\frac{d f_\theta(t)}{dt} + f_\theta(t) = 0, \quad t \in [0,1],$$

with the initial conditions $f_\theta(0) = 2$, $f_\theta^{(1)}(0) = 0$, to study the posterior distribution of $\theta$. The above system
is not analytically solvable. We consider the situation where the true regression function belongs to the
solution set. For a sample of size $n$, the predictor variables $X_1, \ldots, X_n$ are drawn from the Uniform(0,1)
distribution. Samples of sizes 100 and 500 are considered. We simulate 1000 replications for each case.
Under each replication a sample of size 1000 is drawn from the posterior distribution of $\theta$ using RKSB,
RKTB and Bayesian two-step methods and then 95% equal tailed credible intervals are obtained. The
simulation results are summarized in Table 2. The Bayesian two-step method is abbreviated as “TS” in the
table. We calculate the coverage and the average length of the corresponding credible interval over these
1000 replications. We also compare these three methods with the nonlinear least squares (NLS) technique
based on exhaustive numerical solution of the ODE where we construct a 95% confidence interval using
asymptotic normality. The estimated standard errors of the interval length and coverage are given inside the
parentheses in the table. The true parameter is chosen as $\theta_0 = 1$. The true distribution of error is
taken to be $N(0, (0.1)^2)$. We put an inverse gamma prior on $\sigma^2$ with shape and scale parameters being 99 and
1 respectively. For RKSB the prior for $\theta$ is chosen as an independent Gaussian distribution with mean 6 and
variance 16. We take $n$ grid points to obtain the numerical solution of the ODE by RK4 for a sample of size
$n$. We take $m = 5$ and $m = 7$ for RKTB and the Bayesian two-step method respectively. Looking at the order
of $k_n$ suggested by Theorem 2, $k_n$ is chosen as 3 and 4 for $n = 100$ and $n = 500$ respectively in RKTB.
In the Bayesian two-step method the choices are 2 and 3 for $n = 100$ and $n = 500$ respectively. The weight
function for TS is chosen as w{t) = 


“Table 2 here”


Note the similarity in the outputs corresponding to RKTB, RKSB and NLS because of asymptotic efficiency 
while TS intervals are much wider. However, TS is computationally much faster. 

8 Proofs 

We use the operators $E_0(\cdot)$ and $\mathrm{Var}_0(\cdot)$ to denote expectation and variance with respect to $P_0$.


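Proof of Theorem 1. As in Lemma 1 of Bhaumik and Ghosal (2014b) we can argue that there exists a
compact subset $U$ of $(0, \infty)$ such that $\Pi_n(\sigma^2 \in U \mid X, Y) \stackrel{P_0}{\to} 1$. Let $\Pi_{U,n}(\cdot \mid X, Y)$ be the posterior distribution
conditioned on $\sigma^2 \in U$. By Theorem 2.1 of Kleijn and van der Vaart (2012) if we can ensure that there exist
stochastically bounded random variables $\mu_n$ and a positive definite matrix $\Sigma$ such that for every compact
set $K \subset \mathbb{R}^{p+1}$,

$$\sup_{h \in K} \left| \log \frac{p^{(n)}_{\gamma_0 + h/\sqrt{n},\, n}}{p^{(n)}_{\gamma_0, n}}(X, Y) - h^T \Sigma^{-1} \mu_n + \frac{1}{2} h^T \Sigma^{-1} h \right| \to 0 \tag{20}$$

in (outer) $P_0^{(n)}$-probability and that for every sequence of constants $M_n \to \infty$, we have

$$P_0^{(n)}\, \Pi_{U,n}\left( \sqrt{n}\, \|\gamma - \gamma_0\| > M_n \mid X, Y \right) \to 0, \tag{21}$$

then $\left\| \Pi_{U,n}\left(\sqrt{n}(\gamma - \gamma_0) \in \cdot \mid X, Y\right) - N(\mu_n, \Sigma) \right\|_{TV} \stackrel{P_0}{\to} 0$. To show that the conditions (20) and (21)
hold, we prove results similar to Lemmas 2 to 5 of Bhaumik and Ghosal (2014b). Following the steps of
Lemma 2 of Bhaumik and Ghosal (2014b) we get

$$\Sigma = \begin{pmatrix} \sigma_*^2 V_{\theta_0}^{-1} & 0 \\ 0 & 2\sigma_*^4 \end{pmatrix}$$

with

$$V_{\theta_0} = \int_0^1 \left( \dot f_{\theta_0}^T(t)\, \dot f_{\theta_0}(t) - \frac{\partial}{\partial \theta}\left( \dot f_\theta^T(t)\,(f_0(t) - f_{\theta_0}(t)) \right)\Big|_{\theta = \theta_0} \right) g(t)\,dt$$

and $\mu_n = \Sigma\, \mathbb{G}_n \dot\ell_{\gamma_0,n}$. Finally we get

$$\left\| \Pi_n\left(\sqrt{n}(\gamma - \gamma_0) \in \cdot \mid X, Y\right) - N(\mu_n, \Sigma) \right\|_{TV} \stackrel{P_0}{\to} 0$$

since $\|\Pi_n - \Pi_{U,n}\|_{TV} = o_{P_0}(1)$, and the result follows. □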
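Proof of Corollary 1. The log-likelihood of the correctly specified model with Gaussian error is given by

$$\ell_{\gamma_0}(X, Y) = -\log \sigma_0 - \frac{1}{2\sigma_0^2} |Y - f_{\theta_0}(X)|^2 + \log g(X).$$

Thus $\frac{\partial}{\partial \theta} \ell_{\gamma_0}(X, Y) = \sigma_0^{-2} (\dot f_{\theta_0}(X))^T (Y - f_{\theta_0}(X))$ and
$\frac{\partial}{\partial \sigma^2} \ell_{\gamma_0}(X, Y) = -\frac{1}{2\sigma_0^2} + \frac{1}{2\sigma_0^4} |Y - f_{\theta_0}(X)|^2$.
Hence, the Fisher information is given by

$$I(\gamma_0) = \begin{pmatrix} \sigma_0^{-2} \int_0^1 \dot f_{\theta_0}^T(t)\, \dot f_{\theta_0}(t)\, g(t)\,dt & 0 \\ 0 & \sigma_0^{-4}/2 \end{pmatrix}.$$

Thus $\Sigma = (I(\gamma_0))^{-1}$ if the regression function is correctly specified and the true error distribution is
$N(0, \sigma_0^2)$. □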

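Proof of Theorem 2. For $f(\cdot, \beta) = \beta^T N(\cdot)$ we have $\int_0^1 C(t)\, \beta^T N(t)\, g(t)\,dt = G_n^T \beta$, where

$$G_n^T = \int_0^1 C(t) N^T(t)\, g(t)\,dt$$

which is a matrix of order $p \times (k_n + m - 1)$. We can derive the posterior consistency of $\sigma^2$ similar to Lemma
11 of Bhaumik and Ghosal (2014b). Following a result similar to Lemma 9 of Bhaumik and Ghosal (2014b),
it can be shown that on a set with high posterior probability

$$\left\| \sqrt{n}(\theta - \theta_0) - \sqrt{n}\left( G_n^T \beta - (J(\theta_0))^{-1} \int_0^1 (\dot f_{\theta_0}(t))^T f_0(t)\, g(t)\,dt \right) \right\| \to 0$$

as $n \to \infty$. Then it suffices to show that for any neighborhood $\mathcal{N}$ of $\sigma_0^2$,

$$\sup_{\sigma^2 \in \mathcal{N}} \left\| \Pi_n^*\left( \sqrt{n}\, G_n^T \beta - \sqrt{n}\,(J(\theta_0))^{-1} \int_0^1 (\dot f_{\theta_0}(t))^T f_0(t)\, g(t)\,dt \in \cdot \mid X, Y, \sigma^2 \right) - N\left(\mu_n^*, \sigma^2 \Sigma_n^*\right) \right\|_{TV}$$

is $o_{P_0}(1)$. The rest of the proof follows from that of Theorem 4.2 of Bhaumik and Ghosal (2014b). □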


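Proof of Corollary 2. The log-likelihood of the correctly specified model is given by

$$\ell_{\theta_0}(X, Y) = -\log \sigma_0 - \frac{1}{2\sigma_0^2} |Y - f_{\theta_0}(X)|^2 + \log g(X).$$

Thus $\dot\ell_{\theta_0}(X, Y) = \sigma_0^{-2} (\dot f_{\theta_0}(X))^T (Y - f_{\theta_0}(X))$ and the Fisher information is given by
$I(\theta_0) = \sigma_0^{-2} \int_0^1 (\dot f_{\theta_0}(t))^T \dot f_{\theta_0}(t)\, g(t)\,dt$. Following the proof
of Lemma 10 of Bhaumik and Ghosal (2014b) we get

$$\sigma_0^2 \Sigma_n^* \stackrel{P_0}{\to} \sigma_0^2\, (J(\theta_0))^{-1} \int_0^1 (\dot f_{\theta_0}(t))^T \dot f_{\theta_0}(t)\, g(t)\,dt\, \left((J(\theta_0))^{-1}\right)^T.$$

This limit is equal to $(I(\theta_0))^{-1}$ under the correct specification of the regression function as well as the error
model. □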
Proof of Lemma 1. By the definitions of $\theta$ and $\theta_0$ we have

$$\int_0^1 G(t, h(t), \theta)\, w(t)\,dt = 0, \qquad \int_0^1 G(t, h_0(t), \theta_0)\, w(t)\,dt = 0.$$

Subtracting the second equation from the first and applying the Mean-value Theorem, we get

$$\int_0^1 D_{\theta_0}\left(G(t, h_0(t), \theta_0)\right) w(t)\,dt\,(\theta - \theta_0) + \int_0^1 D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t)\,(h(t) - h_0(t))\,dt$$
$$+\ O\left( \sup_{t \in [0,1]} \|h(t) - h_0(t)\|^2 \right) + O\left( \|\theta - \theta_0\|^2 \right) = 0.$$

Now we will show that $\int_0^1 D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t)\,(h(t) - h_0(t))\,dt$ is a linear functional of $f - f_0$. Note
that $\int_0^1 D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t)\,(h(t) - h_0(t))\,dt$ can be written as

$$\sum_{r=0}^{q} \int_0^1 \left[ D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t) \right]_{,r+1} \left( f^{(r)}(t, \beta) - f_0^{(r)}(t) \right) dt.$$

We shall show that every term of this sum is a linear functional of $f - f_0$. We observe that for $r = 0, \ldots, q$

$$\int_0^1 \left[ D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t) \right]_{,r+1} \left( f^{(r)}(t, \beta) - f_0^{(r)}(t) \right) dt
= (-1)^r \int_0^1 \frac{d^r}{dt^r} \left[ D_{h_0}\left(G(t, h_0(t), \theta_0)\right) w(t) \right]_{,r+1} \left( f(t, \beta) - f_0(t) \right) dt$$

using integration by parts and the fact that the function $w(\cdot)$ and its first $(q-1)$ derivatives vanish at 0 and
1. Proceeding this way we get

$$M(h_0, \theta_0)(\theta - \theta_0) - \Gamma(f - f_0) + O\left( \sup_{t \in [0,1]} \|h(t) - h_0(t)\|^2 \right) + O\left( \|\theta - \theta_0\|^2 \right) = 0.$$

Let $E_n = \{(h, \theta) : \sup_{t \in [0,1]} \|h(t) - h_0(t)\| \leq \epsilon_n,\ \|\theta - \theta_0\| \leq \epsilon_n\}$, where $\epsilon_n \to 0$. Using the steps of
the proofs of Lemmas 2 and 3 of Bhaumik and Ghosal (2014a), we can prove the posterior consistency of
$\theta$. Hence, there exists a sequence $\{\epsilon_n\}$ so that $\Pi(E_n^c \mid X, Y) = o_{P_0}(1)$. Hence, on $E_n$

$$\sqrt{n}(\theta - \theta_0) = \left( (M(h_0, \theta_0))^{-1} + o(1) \right) \sqrt{n}\, \Gamma(f - f_0) + \sqrt{n} \sup_{t \in [0,1]} \|h(t) - h_0(t)\|^2\, O(1).$$

By a result similar to Lemma 4 of Bhaumik and Ghosal (2014a), $\sqrt{n}\, \Gamma(f - f_0)$ assigns most of its mass inside
a large compact set. Now using a result similar to Lemma 2 of Bhaumik and Ghosal (2014a), we can assert
that on $E_n$, the second term on the display is $o(1)$ and the conclusion follows. □




























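Proof of Theorem 3. By Lemma 1 and (18), it suffices to show that for any $\sigma^2$ in a neighborhood of $\sigma_0^2$,

$$\left\| \Pi_n^{**}\left( \sqrt{n}\, H_n^T \beta - \sqrt{n}\,(M(h_0, \theta_0))^{-1}\, \Gamma(f_0) \in \cdot \mid X, Y, \sigma^2 \right) - N\left(\mu_n^{**}, \sigma^2 \Sigma_n^{**}\right) \right\|_{TV} = o_{P_0}(1). \tag{22}$$

Note that the posterior distribution of $H_n^T \beta$ is a normal distribution with mean vector

$$H_n^T \left( X_n^T X_n + n^{-2} k_n I_{k_n+m-1} \right)^{-1} X_n^T Y$$

and dispersion matrix

$$\sigma^2 H_n^T \left( X_n^T X_n + n^{-2} k_n I_{k_n+m-1} \right)^{-1} H_n$$

respectively. We calculate the Kullback-Leibler divergence between two Gaussian distributions and show
that it converges in $P_0$-probability to zero to prove the assertion. The rest of the proof is similar to that of
Theorem 3 of Bhaumik and Ghosal (2014a). □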
Table 1: Arguments of H


p 

1 

2 

3 

4 


^e,r„(afc) 

'^0,rn (®fc) 

^0,r„(afc) + 

^0,r„(afc) + 





'^els<^k) + ^^els<^k) 





'^e^^\<^k) + 7;:^0~!\ak) 

+ 2 ^^fc?Vfc) 

+ 4^^fcn^Vfc) 

[/g -2 


^e,7^\^k) 

^e^!\(^k) + ^^e^^\ak) 

'^e^!\^k) + ^^e^^\ak) 

J79-1 

'^e^^\<^k) 

^e,r^\^k) 

+^^e,r!\ak) 

^07n^^(afc) + 

+ 4^^1 

'^0,r^\(^k) + 

+ 2^^2 

W 

‘^e^^\<^k) 

^e,r,^\^k) 


'^0,r!\(^k) + ^^3 
Table 2: Coverages and average lengths of the Bayesian credible intervals and confidence intervals

                      RKTB                RKSB                 TS                  NLS
  n             coverage  length    coverage  length    coverage  length    coverage  length
                  (se)     (se)       (se)     (se)       (se)     (se)       (se)     (se)

  100  $\theta$   95.6     0.34       94.7     0.33       94.9     1.95       95.2     0.32
                 (0.02)   (0.06)     (0.02)   (0.04)     (0.02)   (0.71)     (0.02)   (0.03)

  500  $\theta$   95.7     0.14       95.4     0.14       96.7     0.70       95.1     0.14
                 (0.01)   (0.01)     (0.01)   (0.01)     (0.01)   (0.13)     (0.01)   (0.01)